Enterprise Search

Advanced AEM Search: Consuming External Content and Enriching Content with Apache Camel

I had the pleasure of speaking at CIRCUIT 2016 on a new architecture for indexing AEM content and external content using ActiveMQ, Apache Camel and Solr. My slides are available on SlideShare. The demo code for indexing both products and AEM content is available on GitHub.

Initially, I had planned for a rather ambitious demo, but ran out of time during the talk. As such, I recorded a fairly lengthy video, which is available on YouTube or inline below.

A big thank you to all the attendees and conference coordinators!

Solr Document Processing with Apache Camel - Part III

For those of you who are still following along, let's recap what we've accomplished since the last post, Solr Document Processing with Apache Camel - Part II. We started by deploying SolrCloud with the sample gettingstarted collection and then developed a very simple standalone Camel application to index products from a handful of stub JSON files.

In this post, we will continue to work against the SolrCloud cluster we set up previously. If you haven't done this, refer to the Apache Solr Setup section in README.md. We will also start out with a new Maven project, available on GitHub as camel-dp-part3. This project will be similar to the last one, but with the following changes:

  1. We will be using a real data source. Specifically, Best Buy's movie product line.
  2. We will introduce property placeholders. This will allow us to specify environment-specific configuration within a Java properties file (a minimal sketch of the setup follows below).
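
To give a flavor of the second item, here is a minimal sketch of wiring up Camel's properties component in a standalone application, so that endpoint URIs can reference values such as the data directory or Solr URL from a properties file. The file and property names (environment.properties, products.dir, solr.url) are illustrative placeholders, not necessarily those used in camel-dp-part3.

    // A minimal sketch of Camel's property placeholder support in a standalone
    // application. File, directory and property names are illustrative only.
    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.component.properties.PropertiesComponent;
    import org.apache.camel.impl.DefaultCamelContext;

    public class PropertyPlaceholderExample {

        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();

            // environment.properties (on the classpath) might contain:
            //   products.dir=data/movies
            //   solr.url=http://localhost:8983/solr/gettingstarted
            PropertiesComponent properties = new PropertiesComponent();
            properties.setLocation("classpath:environment.properties");
            context.addComponent("properties", properties);

            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    // {{...}} placeholders are resolved from environment.properties,
                    // keeping environment-specific values out of the route itself.
                    from("file:{{products.dir}}?noop=true")
                        .log("Picked up ${header.CamelFileName}");
                }
            });

            context.start();
            Thread.sleep(10000); // let the route poll for a while, then shut down
            context.stop();
        }
    }
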
Read More

Solr Document Processing with Apache Camel - Part II

In my last post, Solr Document Processing with Apache Camel - Part I, I made the case for using Apache Camel as a document processing platform. In this article, our objective is to create a simple standalone Apache Camel application for ingesting products into Solr. While this example may seem a bit contrived, it is intended to provide a foundation for future articles in the series.

Our roadmap for today is as follows:

  1. Set up Solr
  2. Create a Camel Application
  3. Index sample products into Solr via Camel (a minimal sketch follows below)
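
As a preview of where we are headed, below is a minimal sketch of a standalone Camel application that picks up JSON product stubs from a local directory and pushes them into Solr using SolrJ's 4.x-era HttpSolrServer API. The directory, field names and Solr URL are assumptions for illustration and may differ from the actual camel-dp-part2 project.

    // Minimal sketch: a standalone Camel application that reads JSON product stubs
    // and indexes them into Solr with SolrJ. Requires camel-core, camel-jackson and
    // solr-solrj on the classpath. Paths, fields and URLs are illustrative only.
    import java.util.Map;

    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.main.Main;
    import org.apache.camel.model.dataformat.JsonLibrary;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ProductIndexer {

        public static void main(String[] args) throws Exception {
            // Point SolrJ at one node of the cluster for simplicity.
            final SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/gettingstarted");

            Main main = new Main();
            main.addRouteBuilder(new RouteBuilder() {
                @Override
                public void configure() {
                    from("file:data/products?noop=true")
                        .unmarshal().json(JsonLibrary.Jackson, Map.class)
                        .process(exchange -> {
                            Map<?, ?> product = exchange.getIn().getBody(Map.class);
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", product.get("sku"));
                            doc.addField("name", product.get("name"));
                            solr.add(doc);
                            solr.commit();
                        })
                        .log("Indexed ${header.CamelFileName}");
                }
            });
            main.run(); // blocks until the JVM is stopped
        }
    }
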
Read More

AEM Solr Search 2.0.0

We just released AEM Solr Search 2.0.0. This is the first major release since I gave my talk at adaptTo(). Check out the following links to get you started:

  1. AEM Solr Search Wiki. This is the new source of truth for our documentation.
  2. AEM Solr Search 2.0.0 demo on YouTube.

We hope you enjoy the new release. Drop us a line at aemsolr@headwire.com if you need help with your AEM / Solr integration.

In our next release, expect document processing support and integration with a product catalog. This release will coincide with my talk at CIRCUIT in July: Advanced AEM Search - Consuming External Content and Enriching Content with Apache Camel.

Supporting Multi-Term Synonyms in hybris 5.4 / Solr 4.6.1

Recently, I was working with a client on a hybris 5.4 implementation and was asked to import their synonyms from their current platform. Easy enough, right? Wrong. The out-of-the-box synonym integration allows business users to define multi-term synonyms on the "from-side" in the hMC; however, at the time of this writing, Solr 4.x does not natively support multi-term synonyms on the from-side of the synonym mapping. For example, if we had a synonym on the from-side (i.e., classic gaming console) mapped to the "to-side" (i.e., nintendo entertainment system), the hMC would silently allow this synonym definition. However, it would have no effect on the Solr side.
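
For context, the mapping in question would look like the following entry in Solr's synonyms.txt (standard Solr synonym file syntax, shown here only to illustrate the problem):

    # A multi-term "from-side" synonym as a business user would define it in the hMC.
    # In Solr 4.x the standard query parser splits the query on whitespace before
    # analysis, so the synonym filter never sees the full phrase and the mapping
    # silently has no effect at query time.
    classic gaming console => nintendo entertainment system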

Read More

AEM Solr Search Now Available

I am happy to announce that AEM Solr Search is finally out in beta! Visit http://www.aemsolrsearch.com/ and start integrating AEM with Apache Solr. Watch the video for a quick preview of building a rapid front-end search experience, then jump into the Getting Started guide and experiment with the Geometrixx Media Sample application.

Read More

adaptTo() 2014 - Integrating Open Source Search with CQ/AEM

I just received confirmation that I will be speaking at adaptTo() 2014. This session describes several approaches for integrating Apache Solr with AEM. It starts with an introduction to various pull and push indexing strategies (e.g., Sling Eventing, content publishing and web crawling). The topic of content ingestion is followed by an approach for delivering rapid search front-end experiences using AEM Solr Search. 

A quick start implementation of the search stack will be provided as part of this presentation. The quick start installer includes pre-configured instances of Apache Solr and Apache Nutch. This presentation will also include the source code for the Community Edition of headwire.com’s AEM Solr Search. AEM Solr Search is a suite of AEM search components and services designed to integrate with Apache Solr. 

There will be a hackathon session afterwards, so it would be great to see you in person.

Integrating Apache Solr with Adobe CQ / AEM

Recently, I have been noticing a bit of interest from the CQ community regarding CQ / Solr integration. However, as most people have pointed out, there isn't a clear path detailed anywhere. Given the interest, I will be posting regularly on the subject. This first post will stay relatively high-level and discuss the possible integration points.

There are really two areas that should be considered when integrating Solr with CQ: indexing content and searching content. For the most part, you can treat these as two independent efforts.

Indexing CQ Content

Over the past six months, I have experimented with multiple approaches to indexing CQ content in Solr. Each approach has its respective strengths and weaknesses.

  1. Crawl your site using an external crawler.
  2. Create one or more CQ servlets to serialize your content into a Solr JSON or Solr XML Update format.
  3. Create an observer within CQ to listen for page modifications and trigger indexing operations to Solr.

Using an External Crawler

Using an external crawler such as Nutch or Heritrix is perhaps the simplest way to start indexing your CQ content; however, it does have its drawbacks. Using a crawler involves working with unstructured content, mainly in the form of HTML documents. While most crawlers do a decent job of extracting the content body, title, URL, description, keywords and other metadata, you typically need to define a strategy for extracting other useful data points to drive functionality such as faceting. Extracting this information can be achieved in several ways: use an external document processing framework (recommended), use Solr's Update Request Processor (not recommended), use Solr's tokenizers for basic extraction, etc.

The other drawback is that a crawler pulls content on its own schedule. There are ways around this; however, using a crawler typically means sacrificing real-time indexing.

CQ Servlets & Solr Update JSON/XML

Another possible approach is to create one or more CQ servlets that produce a dump of your CQ content using Solr's Update JSON or Update XML format. The advantage here is that you are working with structured content and have full access to CQ's APIs for querying JCR content. An external cron job can then fetch this page using curl and post it to Solr.
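
A minimal sketch of such a servlet is shown below, assuming the Felix SCR annotations available in that era of CQ. The servlet path, content root and field names are illustrative; a real implementation would also handle paging and a richer field mapping.

    // Minimal sketch: a CQ/Sling servlet that serializes pages below a content root
    // into Solr's Update JSON format (an array of documents). Paths and fields are
    // illustrative only.
    import java.io.IOException;
    import java.util.Iterator;

    import javax.servlet.ServletException;

    import org.apache.felix.scr.annotations.sling.SlingServlet;
    import org.apache.sling.api.SlingHttpServletRequest;
    import org.apache.sling.api.SlingHttpServletResponse;
    import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

    import com.day.cq.wcm.api.Page;
    import com.day.cq.wcm.api.PageFilter;
    import com.day.cq.wcm.api.PageManager;

    @SlingServlet(paths = "/bin/solr/dump", methods = "GET")
    public class SolrDumpServlet extends SlingSafeMethodsServlet {

        @Override
        protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
                throws ServletException, IOException {

            response.setContentType("application/json");
            response.setCharacterEncoding("UTF-8");

            PageManager pageManager = request.getResourceResolver().adaptTo(PageManager.class);
            Page root = pageManager.getPage("/content/geometrixx-media/en"); // illustrative root
            if (root == null) {
                response.sendError(SlingHttpServletResponse.SC_NOT_FOUND);
                return;
            }

            StringBuilder json = new StringBuilder("[");
            Iterator<Page> pages = root.listChildren(new PageFilter(), true); // deep traversal
            boolean first = true;
            while (pages.hasNext()) {
                Page page = pages.next();
                if (!first) {
                    json.append(",");
                }
                json.append("{\"id\":\"").append(page.getPath()).append("\",")
                    .append("\"title\":\"").append(escape(page.getTitle())).append("\",")
                    .append("\"description\":\"").append(escape(page.getDescription())).append("\"}");
                first = false;
            }
            json.append("]");
            response.getWriter().write(json.toString());
        }

        private static String escape(String value) {
            return value == null ? "" : value.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    }

A scheduled job can then fetch /bin/solr/dump with curl and post the resulting JSON to Solr's update handler.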

A variation of this approach is to use a selector to render a page in either the Solr JSON or XML update format. 

CQ Observer

Using a CQ observer provides the tightest integration with Solr and, as such, enables real-time indexing. Like the CQ servlet approach, it simplifies content extraction since you are working with structured data. There are several methods for implementing an observer; refer to Event Handling in CQ by Nicolas Peltier. My personal preference is listening to Page Events and Replication Events using Sling Eventing. In this approach, once you receive an event, such as a page modification, you can use the SolrJ API to update the Solr index.
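
Below is a minimal sketch of that pattern, assuming the Felix SCR annotations and the SolrJ 4.x HttpSolrServer API. The Solr URL and field mapping are illustrative, and a production handler would also cover replication events, batching and proper error handling.

    // Minimal sketch: a Sling event handler that pushes page changes to Solr with
    // SolrJ. URLs and field names are illustrative only.
    import java.util.Iterator;

    import org.apache.felix.scr.annotations.Component;
    import org.apache.felix.scr.annotations.Property;
    import org.apache.felix.scr.annotations.Service;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.osgi.service.event.Event;
    import org.osgi.service.event.EventHandler;

    import com.day.cq.wcm.api.PageEvent;
    import com.day.cq.wcm.api.PageModification;

    @Component(immediate = true)
    @Service(EventHandler.class)
    @Property(name = "event.topics", value = PageEvent.EVENT_TOPIC)
    public class SolrPageEventHandler implements EventHandler {

        // Illustrative endpoint; in practice this belongs in an OSGi configuration.
        private final SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        @Override
        public void handleEvent(Event event) {
            PageEvent pageEvent = PageEvent.fromEvent(event);
            if (pageEvent == null) {
                return;
            }
            Iterator<PageModification> mods = pageEvent.getModifications();
            while (mods.hasNext()) {
                PageModification mod = mods.next();
                try {
                    if (mod.getType() == PageModification.ModificationType.DELETED) {
                        solr.deleteById(mod.getPath());
                    } else {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", mod.getPath());
                        // A real handler would resolve the page and map its title,
                        // description and body to Solr fields here.
                        solr.add(doc);
                    }
                    solr.commit();
                } catch (Exception e) {
                    // Log and continue; indexing failures should not break authoring.
                }
            }
        }
    }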

Searching CQ

Once you have your CQ content indexed in Solr, you will need a search interface. While there are several approaches for building search experiences against Solr, the most popular approach is to use Solr's Java API, SolrJ. For client-side integration, ajax-solr is a great choice.
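
On the server side, a SolrJ query is only a few lines. Here is a minimal sketch using the SolrJ 4.x API; the core name and field names are assumptions for illustration:

    // Minimal sketch: querying Solr with SolrJ (4.x-era API). Core and field names
    // are illustrative only.
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SearchExample {

        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("camel");
            query.setRows(10);
            query.addFacetField("tags"); // facet on an illustrative field

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("title"));
            }
        }
    }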

Lastly, I need to shamelessly plug an upcoming integration for CQ and Solr by headwire.com, Inc., aptly named CQ Solr Search. This integration offers support for building search interfaces using search components built on ajax-solr, as well as a configurable CQ observer for real-time Solr indexing. We will be introducing the first public implementation on CQ Blueprints. Our intent is to provide one place for searching all CQ/Sling/JCR content on the web.

Upcoming

Based on community feedback, please stay tuned for the following:

  1. CQ Solr Search by headwire.com, Inc. - (Not yet available)
  2. A Step-by-Step Guide to Indexing CQ with Nutch (Coming soon)
  3. A Step-by-Step Guide to Indexing CQ with CQ Servlets (Coming soon)
  4. A Step-by-Step Guide to Indexing CQ using an Observer (Coming soon)